Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents

نویسندگان

  • Irina Illina
  • Dominique Fohr
چکیده

Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. To increase the vocabulary coverage, a huge amount of text data should be used. In this paper, we extend the previously proposed neural networks for word embedding models: word vector representation proposed by Mikolov is enriched by an additional non-linear transformation. This model allows to better take into account lexical and semantic word relationships. In the context of broadcast news transcription and in terms of recall, experimental results show a good ability of the proposed model to select new relevant proper names.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Continuous word representation using neural networks for proper name retrieval from diachronic documents

Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. One approach for increasing the vocabulary coverage of a speech transcription system is to automatically retrieve new proper names from contemporary diachronic text documents. In recent years, neural network...

متن کامل

Proper name retrieval from diachronic documents for automatic speech transcription using lexical and temporal context

Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving new proper names from contemporary diachronic text documents. The idea is to use in-vocabulary proper names as an anchor to collect new linked proper names from the diachronic corpus. Our assump...

متن کامل

How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News

Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of topic and semantic context of the OOV words captured from a diachronic text corpus. In this paper we investigate how the choice of documents for the diachronic text corpora affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first...

متن کامل

Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval

Background and Aim: this research investigates the impact of authors’ rank in Bibliographic networks on document-centered model of Expertise Retrieval. Its purpose is to find out what kind of authors’ ranking in bibliographic networks can improve the performance of document-centered model.   Methodology: Current research is an experimental one. To operationalize research goals, a new test colle...

متن کامل

Neural Network Model of System for Information Retrieval from Text Documents in Slovak Language

The aim of the paper is to describe the information retrieval model which retrieves the information from the text documents in Slovak language and which, for this purpose, uses the neural networks. This model comes from linguistic and conceptual approach for the analysis of text documents in Slovak language. The neural network model, based on multilayer perceptron and spreading activation netwo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017